Last updated: 2025-09-29
Checks: 5 2
Knit directory: PIPAC_spatial/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20240917) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /home/hnatri/PIPAC_spatial/ | . |
| /home/hnatri/PIPAC_spatial/code/colors_themes.R | code/colors_themes.R |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 2221fdb. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .RData
Ignored: cell_annotation_top_markers_denoist.tsv
Ignored: cell_main_cluster_marker_annotations_denoist.tsv
Ignored: celltype_markers.tsv
Ignored: code/.Rhistory
Ignored: code/Proximity_analysis/.DS_Store
Ignored: code/Proximity_analysis/data/
Ignored: code/Proximity_analysis/output/
Ignored: code/RSC_latest_EDM_2025-08-06/
Ignored: code/construct_niches_tumor.Rout
Ignored: code/pairwise_proximity.Rout
Ignored: code/pairwise_proximity_chunk.Rout
Ignored: immune_cluster_marker_annotations.tsv
Ignored: immune_cluster_marker_annotations_2ndpass.tsv
Ignored: main_cluster_marker_annotations.tsv
Ignored: nonimmune_cluster_marker_annotations.tsv
Ignored: nonimmune_cluster_marker_annotations_2ndpass.tsv
Unstaged changes:
Modified: analysis/Xenium_preprocess_wholecell.Rmd
Modified: analysis/compare_cellular_nuclear_denoist_annot.Rmd
Modified: analysis/index.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/Xenium_preprocess_wholecell.Rmd) and HTML
(docs/Xenium_preprocess_wholecell.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 7c46eb3 | heinin | 2025-09-29 | Updated to denoised transcript data |
suppressPackageStartupMessages({
library(workflowr)
library(arrow)
library(Seurat)
library(SeuratObject)
library(SeuratDisk)
library(tidyverse)
library(tibble)
library(ggplot2)
library(ggpubr)
library(ggrepel)
library(workflowr)})
setwd("/home/hnatri/PIPAC_spatial/")
set.seed(9999)
options(scipen = 99999)
options(ggrepel.max.overlaps = Inf)
# Colors, themes, and cell type markers
#source("/home/hnatri/PIPAC_spatial/code/colors_themes.R")
# A simplified VlnPlot
BetterVlnPlot <- function(data, features, group_by, ylim = NA){
VlnPlot(data, pt.size = 0, features = features,
group.by = group_by, y.max = ylim) + labs(x = "") + NoLegend()
}
# DimPlots of before and after for each sample
DimPlotCompare <- function(tma){
bf_cells <- ncol(subset(merged_spatial_unfiltered, subset = TMA == tma))
a <- DimPlot(subset(merged_spatial_unfiltered, subset = TMA == tma),
reduction = "sp") + NoLegend() +
labs(title = paste0(tma, ", Unfiltered, ", bf_cells, " nuclei"))
aft_cells <- ncol(subset(merged_spatial, subset = TMA == tma))
b <- DimPlot(subset(merged_spatial, subset = TMA == tma),
reduction = "sp") + NoLegend() +
labs(title = paste0(tma, ", Filtered, ", aft_cells, " nuclei"))
ggarrange(a,b)
}
# /tgen_labs/banovich/xenium_run_folders/output-XETG00048__0034123__MR_PIPACTMA1__20240911_210055
# /tgen_labs/banovich/xenium_run_folders/PIPACTMA
data_dir <- "/tgen_labs/banovich/xenium_run_folders/PIPACTMA/"
tma_list <- c(
TMA1 = "output-XETG00048__0034123__MR_PIPACTMA1__20240911__210055",
TMA2 = "output-XETG00048__0034221__MR_PIPACTMA2-Rerun__20241107__213721",
TMA3 = "output-XETG00048__0024823__MR_PIPACTMA3-Rerun__20241115__191438",
TMA4 = "output-XETG00048__0033982__MR_PIPACTMA4__20240920__231140",
TMA5 = "output-XETG00048__0033981__MR_PIPACTMA5__20240920__231140",
TMA6 = "output-XETG00048__0034149__MR_PIPACTMA6__20240923__231239",
TMA7 = "output-XETG00048__0034152__MR_PIPACTMA7__20240923__231239",
TMA8 = "output-XETG00048__0033448__MR_PIPACTMA8__20241224__204450",
TMA9 = "output-XETG00048__0041382__MR_PIPACTMA9__20250109__205856")
# Get subdirectory names for obtaining file paths
subdirs <- unname(tma_list)
# Get transcript counts and metadata
all_files <- list.files(file.path(data_dir, subdirs), full.names = TRUE)
h5_files <- all_files[grep(".h5", all_files)]
transcript_files <- all_files[grep("transcripts.parquet", all_files)]
meta_files <- all_files[grep("cells.csv.gz", all_files)]
# Get sample IDs
tma_ids <- names(tma_list)
# Read in files
counts <- lapply(h5_files, Read10X_h5)
transcripts <- lapply(transcript_files, function(XX) {
read_parquet(XX) })
metadata <- lapply(meta_files, function(XX) {
tmp_meta <- read.delim(XX, sep = ",", colClasses = c(cell_id = "character"))
rownames(tmp_meta) <- tmp_meta$cell_id
tmp_meta })
# Rename files in lists
tma_ids <- unlist(lapply(str_split(meta_files, "__"), function(XX) { XX[3] }))
names(counts) <- tma_ids
names(transcripts) <- tma_ids
names(metadata) <- tma_ids
all_transcripts <- list()
nuc_transcripts <- list()
updated_metadata <- list()
nuc_updated_metadata <- list()
for (tma in tma_ids) {
message(paste("Getting cell counts for sample", tma))
# Filter out low quality transcripts
all_transcripts[[tma]] <- transcripts[[tma]][transcripts[[tma]]$qv > 20, ]
# Find transcripts that overlap a nucleus
#nuc_transcripts[[tma]] <- transcripts[[tma]][transcripts[[tma]]$overlaps_nucleus == "1", ]
# Create cell x gene dataframe
all_transcripts[[tma]] <- as.data.frame(table(all_transcripts[[tma]]$cell_id,
all_transcripts[[tma]]$feature_name))
names(all_transcripts[[tma]]) <- c("cell_id", "feature_name", "Count")
all_transcripts[[tma]] <- all_transcripts[[tma]] %>%
pivot_wider(names_from = "feature_name", values_from = "Count")
# Get blanks count per cell
blank_ids <- all_transcripts[[tma]]$cell_id
blank_mat <- all_transcripts[[tma]][, grep("BLANK",
colnames(all_transcripts[[tma]]))]
blank_counts <- as.data.frame(rowSums(blank_mat))
blank_counts$cell_id <- blank_ids
# Remove negative controls and convert to cell x gene matrix
all_transcripts[[tma]][, grep("NegControl",
colnames(all_transcripts[[tma]]),
invert = FALSE)]
all_transcripts[[tma]][, grep("BLANK",
colnames(all_transcripts[[tma]]),
invert = FALSE)]
all_transcripts[[tma]] <- all_transcripts[[tma]][, grep("NegControl",
colnames(all_transcripts[[tma]]),
invert = TRUE)]
all_transcripts[[tma]] <- all_transcripts[[tma]][, grep("BLANK",
colnames(all_transcripts[[tma]]),
invert = TRUE)]
keep_cells <- all_transcripts[[tma]]$cell_id
all_transcripts[[tma]] <- as.data.frame(all_transcripts[[tma]])
rownames(all_transcripts[[tma]]) <- keep_cells
all_transcripts[[tma]] <- all_transcripts[[tma]][, -1]
all_transcripts[[tma]] <- as.matrix(t(all_transcripts[[tma]]))
# Subset nuclear metadata to "cells" with transcripts that overlap nuclei
updated_metadata[[tma]] <- metadata[[tma]][metadata[[tma]]$cell_id %in% keep_cells, ]
# Add blank counts to metadata
updated_metadata[[tma]] <- full_join(updated_metadata[[tma]], blank_counts,
by = "cell_id")
updated_metadata[[tma]] <- updated_metadata[[tma]] %>%
dplyr::rename(num.blank = `rowSums(blank_mat)`)
rownames(updated_metadata[[tma]]) <- updated_metadata[[tma]]$cell_id
}
# Nuclear
for (tma in tma_ids) {
message(paste("Getting cell counts for sample", tma))
# Filter out low quality transcripts
nuc_transcripts[[tma]] <- transcripts[[tma]][transcripts[[tma]]$qv > 20, ]
# Find transcripts that overlap a nucleus
nuc_transcripts[[tma]] <- nuc_transcripts[[tma]][nuc_transcripts[[tma]]$overlaps_nucleus == 1, ]
# Create cell x gene dataframe
nuc_transcripts[[tma]] <- as.data.frame(table(nuc_transcripts[[tma]]$cell_id,
nuc_transcripts[[tma]]$feature_name))
names(nuc_transcripts[[tma]]) <- c("cell_id", "feature_name", "Count")
nuc_transcripts[[tma]] <- nuc_transcripts[[tma]] %>%
pivot_wider(names_from = "feature_name", values_from = "Count")
# Get blanks count per cell
blank_ids <- nuc_transcripts[[tma]]$cell_id
blank_mat <- nuc_transcripts[[tma]][, grep("BLANK",
colnames(nuc_transcripts[[tma]]))]
blank_counts <- as.data.frame(rowSums(blank_mat))
blank_counts$cell_id <- blank_ids
# Remove negative controls and convert to cell x gene matrix
nuc_transcripts[[tma]][, grep("NegControl",
colnames(nuc_transcripts[[tma]]),
invert = FALSE)]
nuc_transcripts[[tma]][, grep("BLANK",
colnames(nuc_transcripts[[tma]]),
invert = FALSE)]
nuc_transcripts[[tma]] <- nuc_transcripts[[tma]][, grep("NegControl",
colnames(nuc_transcripts[[tma]]),
invert = TRUE)]
nuc_transcripts[[tma]] <- nuc_transcripts[[tma]][, grep("BLANK",
colnames(nuc_transcripts[[tma]]),
invert = TRUE)]
keep_cells <- nuc_transcripts[[tma]]$cell_id
nuc_transcripts[[tma]] <- as.data.frame(nuc_transcripts[[tma]])
rownames(nuc_transcripts[[tma]]) <- keep_cells
nuc_transcripts[[tma]] <- nuc_transcripts[[tma]][, -1]
nuc_transcripts[[tma]] <- as.matrix(t(nuc_transcripts[[tma]]))
# Subset nuclear metadata to "cells" with transcripts that overlap nuclei
nuc_updated_metadata[[tma]] <- metadata[[tma]][metadata[[tma]]$cell_id %in% keep_cells, ]
# Add blank counts to metadata
nuc_updated_metadata[[tma]] <- full_join(nuc_updated_metadata[[tma]], blank_counts,
by = "cell_id")
nuc_updated_metadata[[tma]] <- nuc_updated_metadata[[tma]] %>%
dplyr::rename(num.blank = `rowSums(blank_mat)`)
rownames(nuc_updated_metadata[[tma]]) <- nuc_updated_metadata[[tma]]$cell_id
}
obj_list <- list()
obj_list <- sapply(tma_ids, function(XX) {
# Create a Seurat object containing the RNA adata
sobj <- CreateSeuratObject(counts = all_transcripts[[XX]],
assay = "RNA")
# Add metadata
sobj <- AddMetaData(sobj, metadata = updated_metadata[[XX]])
sobj$TMA <- XX
#sobj$tma <- tmas[[XX]]
#sobj$run <- run_ids[[XX]]
# Calculate percent blank
sobj$percent.blank <- sobj$num.blank/(sobj$nCount_RNA + sobj$num.blank)*100
# Remove cells with 0 nCount_RNA
sobj <- subset(sobj, subset = nCount_RNA != 0)
# Rename cells to add TMA name as prefix
cellid <- gsub("MR_PIPAC", "", XX)
cellid <- gsub("-Rerun", "", cellid)
sobj <- RenameCells(sobj, add.cell.id = cellid)
# Add spatial coordinates as dimension reduction objects
#position_xy <- cbind(sobj$adj_x_centroid, sobj$adj_y_centroid)
position_xy <- cbind(sobj$x_centroid, sobj$y_centroid)
row.names(position_xy) <- row.names(sobj@meta.data)
colnames(position_xy) <- c("SP_1", "SP_2")
sobj[["sp"]] <- CreateDimReducObject(embeddings = position_xy, key = "SP_",
assay = DefaultAssay(sobj))
obj_list[[XX]] <- sobj
})
#saveRDS(obj_list, "/scratch/hnatri/PIPAC/obj_list.rds")
# Nuclear only
nuc_obj_list <- list()
nuc_obj_list <- sapply(tma_ids, function(XX) {
# Create a Seurat object containing the RNA adata
sobj <- CreateSeuratObject(counts = nuc_transcripts[[XX]],
assay = "RNA")
# Add metadata
sobj <- AddMetaData(sobj, metadata = nuc_updated_metadata[[XX]])
sobj$TMA <- XX
#sobj$tma <- tmas[[XX]]
#sobj$run <- run_ids[[XX]]
# Calculate percent blank
sobj$percent.blank <- sobj$num.blank/(sobj$nCount_RNA + sobj$num.blank)*100
# Remove cells with 0 nCount_RNA
sobj <- subset(sobj, subset = nCount_RNA != 0)
# Rename cells to add TMA name as prefix
cellid <- gsub("MR_PIPAC", "", XX)
cellid <- gsub("-Rerun", "", cellid)
sobj <- RenameCells(sobj, add.cell.id = cellid)
# Add spatial coordinates as dimension reduction objects
#position_xy <- cbind(sobj$adj_x_centroid, sobj$adj_y_centroid)
position_xy <- cbind(sobj$x_centroid, sobj$y_centroid)
row.names(position_xy) <- row.names(sobj@meta.data)
colnames(position_xy) <- c("SP_1", "SP_2")
sobj[["sp"]] <- CreateDimReducObject(embeddings = position_xy, key = "SP_",
assay = DefaultAssay(sobj))
nuc_obj_list[[XX]] <- sobj
})
#saveRDS(nuc_obj_list, "/scratch/hnatri/PIPAC/nuc_obj_list.rds")
# Get TMA IDs
tma_ids <- names(obj_list)
nuc_obj_list <- readRDS("/scratch/hnatri/PIPAC/nuc_obj_list.rds")
# Merge objects (cannot do spatial DimPlots for this)
#merged_spatial_unfiltered <- nuc_obj_list[[1]]
merged_spatial_unfiltered <- merge(x = nuc_obj_list[[1]], y = nuc_obj_list[2:length(nuc_obj_list)])
#saveRDS(merged_spatial_unfiltered, "/scratch/hnatri/PIPAC/merged_spatial_unfiltered.rds")
merged_spatial_unfiltered <- readRDS("/scratch/hnatri/PIPAC/merged_spatial_unfiltered.rds")
# Add spatial dimension reduction object separately
position_xy <- cbind(merged_spatial_unfiltered$x_centroid,
merged_spatial_unfiltered$y_centroid)
row.names(position_xy) <- row.names(merged_spatial_unfiltered@meta.data)
colnames(position_xy) <- c("SP_1", "SP_2")
merged_spatial_unfiltered[["sp"]] <- CreateDimReducObject(
embeddings = position_xy, key = "SP_", assay = DefaultAssay(merged_spatial_unfiltered))
DimPlot(merged_spatial_unfiltered,
reduction = "sp",
group.by = "TMA",
label = F,
split.by = "TMA",
ncol = 3) +
coord_fixed() + NoLegend()
#saveRDS(merged_spatial_unfiltered, "/scratch/hnatri/PIPAC/merged_spatial_unfiltered.rds")
# Get sample IDs
#tma_ids <- names(obj_list)
#
#cell_obj_list <- list()
#cell_obj_list <- sapply(tma_ids, function(XX) {
# message(paste("Creating cell Seurat object for TMA", XX))
#
# # Create a Seurat object containing the RNA cell information
# sobj <- CreateSeuratObject(counts = counts[[XX]]$`Gene Expression`,
# assay = "RNA")
# rownames(metadata[[XX]]) <- metadata[[XX]]$cell_id
# sobj <- AddMetaData(sobj, metadata = metadata[[XX]])
#
# # Rename cells to add sample ID as prefix
# cellid <- gsub("MR_PIPAC", "", XX)
# cellid <- gsub("-Rerun", "", cellid)
# sobj <- RenameCells(sobj, add.cell.id = cellid)
#
# cell_obj_list[[XX]] <- sobj
#})
#
## Merge cell information
##cell_merged <- cell_obj_list[[1]]
#cell_merged <- merge(cell_obj_list[[1]], y = cell_obj_list[2:length(cell_obj_list)])
obj_list <- readRDS("/scratch/hnatri/PIPAC/obj_list.rds")
# Merge objects (cannot do spatial DimPlots for this)
#merged_spatial_unfiltered <- nuc_obj_list[[1]]
cell_merged <- merge(x = obj_list[[1]], y = obj_list[2:length(obj_list)])
# Add cell information to nuclei object
cell_merged <- JoinLayers(cell_merged)
cell_count_matrix <- cell_merged@assays$RNA$counts
keep_cells <- colnames(merged_spatial_unfiltered)
cell_count_matrix <- cell_count_matrix[, keep_cells]
merged_spatial_unfiltered[["cell_RNA"]] <- CreateAssayObject(counts = cell_count_matrix)
#saveRDS(merged_spatial_unfiltered, "/scratch/hnatri/PIPAC/merged_spatial_unfiltered.rds")
merged_spatial_unfiltered <- readRDS("/scratch/hnatri/PIPAC/merged_spatial_unfiltered.rds")
# Number of cells per TMA before filtering
summary(as.factor(merged_spatial_unfiltered$TMA))
MR_PIPACTMA1 MR_PIPACTMA2-Rerun MR_PIPACTMA3-Rerun MR_PIPACTMA4
161662 193899 238083 168497
MR_PIPACTMA5 MR_PIPACTMA6 MR_PIPACTMA7 MR_PIPACTMA8
176359 185303 254132 135124
MR_PIPACTMA9
136056
merged_spatial_unfiltered@meta.data %>%
ggplot(aes(x = TMA)) +
geom_bar(stat = "count") +
theme_minimal() +
RotatedAxis()
# Percent.blank
#merged_spatial_unfiltered@meta.data %>%
# ggplot(aes(x = percent.blank, fill = TMA)) +
# geom_histogram(bins = 50, show.legend = FALSE, color = "black") +
# theme_classic() +
# theme(title = element_text(color = "black"),
# axis.text = element_text(color = "black")) +
# facet_wrap(~TMA, scales = "free")
# nCount_RNA
merged_spatial_unfiltered@meta.data %>%
ggplot(aes(x = nCount_RNA, fill = TMA)) +
geom_histogram(bins = 50, show.legend = FALSE, color = "black") +
theme_classic() +
theme(title = element_text(color = "black"),
axis.text = element_text(color = "black")) +
facet_wrap(~TMA, scales = "free")
# nucleus_area
merged_spatial_unfiltered@meta.data %>%
ggplot(aes(x = nucleus_area, fill = TMA)) +
geom_histogram(bins = 50, show.legend = FALSE, color = "black") +
theme_classic() +
theme(title = element_text(color = "black"),
axis.text = element_text(color = "black")) +
facet_wrap(~TMA, scales = "free")
merged_spatial_unfiltered$TMA <- factor(merged_spatial_unfiltered$TMA,
levels = rev(sort(unique(merged_spatial_unfiltered$TMA))))
#BetterVlnPlot(merged_spatial_unfiltered,
# features = "percent.blank", group_by = "TMA")
BetterVlnPlot(merged_spatial_unfiltered,
features = "nCount_RNA", group_by = "TMA")
BetterVlnPlot(merged_spatial_unfiltered,
features = "nFeature_RNA", group_by = "TMA")
BetterVlnPlot(merged_spatial_unfiltered,
features = "nucleus_area", group_by = "TMA")
BetterVlnPlot(merged_spatial_unfiltered,
features = "nCount_cell_RNA", group_by = "TMA")
BetterVlnPlot(merged_spatial_unfiltered,
features = "nFeature_cell_RNA", group_by = "TMA")
## nCount_RNA vs. percent.blank
#smoothScatter(merged_spatial_unfiltered@meta.data$percent.blank,
# log(merged_spatial_unfiltered@meta.data$nCount_RNA),
# cex = 0.5, pch = 16)
#abline(v = 4, h = log(12), lty = "dashed", col = "black")
#text(5, 5, col = "black", adj = c(0, -.1),
# "nCount_RNA >= 12 & percent.blank <= 4")
#
## nFeature_RNA vs. percent.blank
#smoothScatter(merged_spatial_unfiltered@meta.data$percent.blank,
# log(merged_spatial_unfiltered@meta.data$nFeature_RNA),
# cex = 0.5, pch = 16)
#abline(v = 4, h = log(10), lty = "dashed", col = "black")
#text(5, 4, col = "black", adj = c(0, -.1),
# "nFeature_RNA >= 10 & percent.blank <= 4")
# nCount_RNA vs. nFeature_RNA
smoothScatter(log(merged_spatial_unfiltered$nCount_RNA),
log(merged_spatial_unfiltered$nFeature_RNA),
cex = 0.5, pch = 16)
abline(v = log(10), h = log(10), lty = "dashed", col = "black")
text(0.3, 4.6, col = "black", adj = c(0, -.1),
"nCount_RNA >= 10 & nFeature_RNA >= 10")
smoothScatter(merged_spatial_unfiltered$nCount_RNA,
merged_spatial_unfiltered$nFeature_RNA,
cex = 0.5, pch = 16)
abline(v = log(10), h = log(10), lty = "dashed", col = "black")
text(0.3, 4.6, col = "black", adj = c(0, -.1),
"nCount_RNA >= 10 & nFeature_RNA >= 10")
# nCount RNA vs. nucleus_area
smoothScatter(merged_spatial_unfiltered$nucleus_area,
log(merged_spatial_unfiltered$nCount_RNA),
cex = 0.5, pch = 16)
abline(v = c(5, 80), h = log(10), lty = "dashed", col = "black")
text(120, 0.7, col = "black", adj = c(0, -.1),
"nCount_RNA >= 10 & nucleus_area between 5-80")
# nFeature RNA vs. nucleus_area
smoothScatter(merged_spatial_unfiltered$nucleus_area,
log(merged_spatial_unfiltered$nFeature_RNA),
cex = 0.5, pch = 16)
abline(v = c(5, 80), h = log(10), lty = "dashed", col = "black")
text(120, 0.4, col = "black", adj = c(0, -.1),
"nFeature_RNA >= 10 & & nucleus_area between 5-80")
# All transcripts
# nCount_RNA vs. nFeature_RNA
smoothScatter(log(merged_spatial_unfiltered$nCount_cell_RNA),
log(merged_spatial_unfiltered$nFeature_cell_RNA),
cex = 0.5, pch = 16)
abline(v = log(15), h = log(15), lty = "dashed", col = "black")
text(0.3, 4.6, col = "black", adj = c(0, -.1),
"nCount_cell_RNA >= 15 & nFeature_cell_RNA >= 15")
smoothScatter(merged_spatial_unfiltered$nCount_cell_RNA,
merged_spatial_unfiltered$nFeature_cell_RNA,
cex = 0.5, pch = 16)
abline(v = log(15), h = log(15), lty = "dashed", col = "black")
text(0.3, 4.6, col = "black", adj = c(0, -.1),
"nCount_cell_RNA >= 15 & nFeature_cell_RNA >= 15")
# Filter merged and individual data
merged_spatial <- subset(merged_spatial_unfiltered,
subset = nCount_cell_RNA >= 15 & nFeature_cell_RNA >= 15 &
#percent.blank <= 5 &
nucleus_area >= 5 & nucleus_area <= 80)
# Number of nuclei before and after filtering
bf_cells <- table(merged_spatial_unfiltered$TMA)
aft_cells <- table(merged_spatial$TMA)
diff_cells <- bf_cells - aft_cells
prop_kept_cells <- round(aft_cells/bf_cells*100, 2)
prop_kept_cells
MR_PIPACTMA9 MR_PIPACTMA8 MR_PIPACTMA7 MR_PIPACTMA6
55.37 49.30 55.74 59.32
MR_PIPACTMA5 MR_PIPACTMA4 MR_PIPACTMA3-Rerun MR_PIPACTMA2-Rerun
41.15 48.81 58.27 57.79
MR_PIPACTMA1
48.38
# DimPlots before and after filtering
DimPlotCompare("MR_PIPACTMA1")
DimPlotCompare("MR_PIPACTMA2-Rerun")
DimPlotCompare("MR_PIPACTMA3-Rerun")
DimPlotCompare("MR_PIPACTMA4")
DimPlotCompare("MR_PIPACTMA5")
DimPlotCompare("MR_PIPACTMA6")
DimPlotCompare("MR_PIPACTMA7")
DimPlotCompare("MR_PIPACTMA8")
DimPlotCompare("MR_PIPACTMA9")
# Saving the object for clustering with ScanPy
merged_spatial <- JoinLayers(merged_spatial)
#saveRDS(merged_spatial, "/scratch/hnatri/PIPAC/merged_spatial_filtered.rds")
# To build on command line, run Rscript -e "rmarkdown::render('Xenium_processing_wholecell.Rmd')"
# Then "mv *.html /home/hnatri/PIPAC_Xenium/docs/"
sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggrepel_0.9.3 ggpubr_0.6.0 lubridate_1.9.2
[4] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
[7] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[10] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[13] SeuratDisk_0.0.0.9021 Seurat_5.0.1 SeuratObject_5.0.1
[16] sp_1.6-1 arrow_12.0.0 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.14 jsonlite_1.8.5
[4] magrittr_2.0.3 ggbeeswarm_0.7.2 spatstat.utils_3.0-3
[7] farver_2.1.1 rmarkdown_2.22 fs_1.6.2
[10] vctrs_0.6.2 ROCR_1.0-11 spatstat.explore_3.2-1
[13] rstatix_0.7.2 htmltools_0.5.5 broom_1.0.4
[16] sass_0.4.6 sctransform_0.4.1 parallelly_1.36.0
[19] KernSmooth_2.23-21 bslib_0.4.2 htmlwidgets_1.6.2
[22] ica_1.0-3 plyr_1.8.8 plotly_4.10.2
[25] zoo_1.8-12 cachem_1.0.8 whisker_0.4.1
[28] igraph_1.4.3 mime_0.12 lifecycle_1.0.3
[31] pkgconfig_2.0.3 Matrix_1.6-5 R6_2.5.1
[34] fastmap_1.1.1 fitdistrplus_1.1-11 future_1.32.0
[37] shiny_1.7.4 digest_0.6.31 colorspace_2.1-0
[40] patchwork_1.1.2 ps_1.7.5 rprojroot_2.0.3
[43] tensor_1.5 RSpectra_0.16-1 irlba_2.3.5.1
[46] labeling_0.4.2 progressr_0.13.0 timechange_0.2.0
[49] fansi_1.0.4 spatstat.sparse_3.0-1 httr_1.4.6
[52] polyclip_1.10-4 abind_1.4-5 compiler_4.3.0
[55] withr_2.5.0 bit64_4.0.5 backports_1.4.1
[58] carData_3.0-5 fastDummies_1.7.3 highr_0.10
[61] ggsignif_0.6.4 MASS_7.3-60 tools_4.3.0
[64] vipor_0.4.5 lmtest_0.9-40 beeswarm_0.4.0
[67] httpuv_1.6.11 future.apply_1.11.0 goftest_1.2-3
[70] glue_1.6.2 callr_3.7.3 nlme_3.1-162
[73] promises_1.2.0.1 grid_4.3.0 Rtsne_0.16
[76] getPass_0.2-4 cluster_2.1.4 reshape2_1.4.4
[79] generics_0.1.3 hdf5r_1.3.8 gtable_0.3.3
[82] spatstat.data_3.0-1 tzdb_0.4.0 hms_1.1.3
[85] data.table_1.14.8 car_3.1-2 utf8_1.2.3
[88] spatstat.geom_3.2-1 RcppAnnoy_0.0.20 RANN_2.6.1
[91] pillar_1.9.0 spam_2.9-1 RcppHNSW_0.5.0
[94] later_1.3.1 splines_4.3.0 lattice_0.21-8
[97] survival_3.5-5 bit_4.0.5 deldir_1.0-9
[100] tidyselect_1.2.0 miniUI_0.1.1.1 pbapply_1.7-0
[103] knitr_1.43 git2r_0.32.0 gridExtra_2.3
[106] scattermore_1.2 xfun_0.39 matrixStats_1.0.0
[109] stringi_1.7.12 lazyeval_0.2.2 yaml_2.3.7
[112] evaluate_0.21 codetools_0.2-19 cli_3.6.1
[115] uwot_0.1.14 xtable_1.8-4 reticulate_1.29
[118] munsell_0.5.0 processx_3.8.1 jquerylib_0.1.4
[121] Rcpp_1.0.10 globals_0.16.2 spatstat.random_3.1-5
[124] png_0.1-8 ggrastr_1.0.2 parallel_4.3.0
[127] ellipsis_0.3.2 assertthat_0.2.1 dotCall64_1.0-2
[130] listenv_0.9.0 viridisLite_0.4.2 scales_1.2.1
[133] ggridges_0.5.4 crayon_1.5.2 leiden_0.4.3
[136] rlang_1.1.1 cowplot_1.1.1